Creating
and documenting processes that detail how to properly back up and
recover from a disaster is an essential step in a disaster recovery
project. Periodically reviewing, validating, and updating these
processes is just as important as creating them. Disaster recovery
planning should not be considered a project for the current calendar
year; instead, it should be considered an essential part of regular
business operations, with a dedicated annual budget and assigned staff.
Each year, many
businesses, business divisions, or departments update their computer
and network infrastructure and change the way they provide services to
their staff, vendors, and clients. In many of these cases, the
responsible information technology staff, cross-departmental managers,
executives, and employees are neither involved in these changes nor
properly informed before they are implemented. Computer
and network infrastructure changes can have ripple effects throughout
an entire organization during transition and during disaster and
failure situations, so proper planning and approval of changes should
always be performed and documented.
To reduce the risk of a change
negatively impacting business operations, many organizations implement
processes that require new projects and system changes to be submitted,
evaluated, and either approved or rejected based on the information
provided. Although this chapter does not focus on project management,
all organizations that utilize computer and network
infrastructures should consider implementing a Project Management
Office and a change-control committee to review and oversee
organizational projects and infrastructure changes.
Project Management Office (PMO)
In recent years,
many organizations have introduced Project Management Offices (PMOs)
into their business operations. A PMO serves as a project oversight
committee for organizations that frequently run several projects
simultaneously. Organizations that use a proven project methodology
can extend it with workflow processes that include checkpoints with
the PMO staff.
The role of the PMO differs from one organization to the next, but
most PMOs share a few key functions. The PMO usually reviews proposed
projects to determine whether, and how, the project deliverables align
with the organization’s current or future business plans or strategies.
PMO membership also varies among organizations and can include
departmental managers, directors or team leads, executive staff,
employee advocates, and, in some cases, board members.
Having the PMO staff represent views and insight from the different
levels and departments of an organization enables the PMO to add value
to any proposed project.
A diverse PMO membership
enables the organization to evaluate and understand current and
proposed projects and how they will positively or negatively affect
the organization as a whole. Some of the general functions or roles a
PMO can provide include the following:
High-level project visibility—
All proposed projects are presented to the PMO and, if approved, are
tracked by the PMO. This provides a single entity that is
knowledgeable and informed about all ongoing and future projects in an
organization and how they align to business and technical objectives.
Project sounding board— When
a new project is proposed or presented to the PMO, the project will be
scrutinized and many questions will be asked. Some of these questions
might not have been considered during the initial project design and
planning phases. The PMO improves project quality by reviewing and
monitoring each project from the time it is proposed and throughout
regularly scheduled project status and PMO meetings.
Committee-based project approval or denial—
Because the PMO is informed of all current and future projects, as
well as business direction and strategy, it is the best-equipped group
to decide whether a project should be approved, denied, or postponed.
Enterprise project management—
The PMO tracks the status of all ongoing projects and upcoming
projects, which enables the PMO to provide additional insight and
direction with regard to internal resource utilization, vendor
management for outsourced projects, and, of course, project budget and
scheduling.
Change Control
Whereas a PMO improves project
management and can provide the necessary checkpoints to verify that
backup and recovery requirements are addressed within the new projects,
an organization with a change-control system can ensure that any
proposed changes have been carefully evaluated and scheduled before
approval or change execution. Change control involves a submittal,
review, and approval process for each change that typically includes
the following information:
Change description— Includes which systems will be changed, what the change is, and why it is proposed or required.
Impact of the change—
Details whether any systems or services will be unavailable during the
execution of the change and who will be affected by the change.
Change duration— Details how long it will take to execute and complete the change and, if necessary, revert or roll back the change.
Change schedule— Includes the proposed date and time to execute the change.
Change procedure—
Details how the change will be executed, including a detailed
description; this usually also includes detailed steps or an
accompanying document.
Change rollback plan— Details the steps necessary to recover or roll back the change in the event that the change causes undesirable results.
Change owners—
Identifies who will execute the change and who is responsible for
communicating the status and results of the change back to the
change-control committee.
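The fields listed above map naturally onto a simple record. The following is a minimal Python sketch rather than a prescribed format: the class name ChangeRequest, its field names, the missing_fields helper, and the sample values are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from datetime import datetime, timedelta


@dataclass
class ChangeRequest:
    """Hypothetical record mirroring the change-control fields listed above."""
    description: str      # which systems change, what the change is, and why
    impact: str           # services unavailable and who is affected
    duration: timedelta   # time to execute and, if necessary, roll back
    schedule: datetime    # proposed date and time of execution
    procedure: str        # detailed steps or a reference to a document
    rollback_plan: str    # steps to revert if results are undesirable
    owners: list[str]     # who executes and reports back to the committee


def missing_fields(request: ChangeRequest) -> list[str]:
    """Return the names of fields left empty so the request can be sent back."""
    return [f.name for f in fields(request) if not getattr(request, f.name)]


# Sample values are invented for illustration only.
example = ChangeRequest(
    description="Apply updates to a file server",
    impact="File shares unavailable for about 30 minutes",
    duration=timedelta(minutes=45),
    schedule=datetime(2024, 1, 6, 22, 0),
    procedure="See the accompanying runbook",
    rollback_plan="",     # left empty on purpose
    owners=["server administrator"],
)
print(missing_fields(example))
```

A check like this could run when a change is submitted, so that a request without a rollback plan never reaches the review meeting.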
A change-control
committee, similar to a PMO, is made up of managers, executives, and
employee advocates who review each change and determine whether it is
approved, denied, or postponed. Proposed changes are submitted in
advance. A day or two later, a change-control review meeting is held
in which the change-control committee and the change owner discuss
each change, and the change is approved, denied, postponed, or closed,
or more information is requested.
During failure or
disaster situations, going through the normal change-control process
might not be an option due to the impact of the failure. During these
situations, emergency change-request processes should be followed. An
emergency change request usually involves getting the particular
departmental manager and the responsible information technology
manager, director, or CIO to sign off on the change before it is
executed. In short, all changes need to be considered and approved,
even in failure scenarios when time is of the essence. When an
administrator is troubleshooting and trying to resolve a failure or
trying to recover from a disaster, especially in a stressful situation,
making changes without getting approval can lead to costly mistakes.
Following the proper change-control and emergency change-control
processes to inform and involve others, getting approval from
management, and following documented processes will provide
accountability and might even save the administrator’s job.
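The emergency sign-off rule described above can be expressed as a small check. This is only a sketch under stated assumptions: the role strings, the constant names, and the function emergency_change_approved are hypothetical, and a real organization would define its own approver roles.

```python
# Roles are taken from the text; the names used here are hypothetical.
DEPARTMENT_APPROVER_ROLE = "departmental manager"
IT_APPROVER_ROLES = {"it manager", "it director", "cio"}


def emergency_change_approved(signoff_roles):
    """Return True only when both required sign-offs are present.

    An emergency change needs the departmental manager plus at least one
    of the responsible IT manager, director, or CIO.
    """
    roles = {role.strip().lower() for role in signoff_roles}
    has_department = DEPARTMENT_APPROVER_ROLE in roles
    has_it = bool(roles & IT_APPROVER_ROLES)
    return has_department and has_it
```

Recording who signed off, even in this minimal form, gives the accountability that the emergency process is meant to provide.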
Disaster Recovery Delegation of Responsibilities
At this point, the
organization might have a documented and functional backup and recovery
plan, a PMO, and a change-control committee, but the ownership and
maintenance of disaster recovery operations are not yet defined or
assigned. Disaster recovery roles, functions, or responsibilities might
be folded into an existing executive’s or manager’s duties, or a
dedicated staff member might be required. Commonly, disaster recovery
responsibilities are owned by the chief information officer, operations
manager, chief information security officer, or a combination of these
positions. Of course, responsibilities for different aspects of the
overall disaster recovery plan are delegated to managers, departmental
leads, and staff volunteers as necessary. An example of delegating
disaster recovery responsibilities is contained in the following list:
The chief
information officer is responsible for disaster recovery planning and
for maintaining and executing disaster recovery-related tasks for the
entire telecom, desktop, and server computer infrastructure, the network
infrastructure, and all other electronic and fax-related communication.
The
manager of facilities or operations is responsible for planning
alternate office locations and offsite storage of originals or
duplicates of all important paper documents, such as leases, contracts,
insurance policies, stock certificates, and so on, to support disaster
recovery operations to alternate sites or offices.
The
manager of human resources is responsible for creating and maintaining
emergency contact numbers for the entire company, storing this
information offsite, and communicating with employees to provide
direction and information before a disaster strikes and during a
disaster recovery operation.
The list of
responsibilities can be very granular and extensive, and disaster
recovery planning should not be taken lightly or put on the back
burner. Although there are many aspects of disaster recovery planning,
the remainder of this chapter focuses only on the disaster
recovery responsibilities and tasks that should be assigned to
qualified Windows administrators who need to support a Windows Server
2008 R2 environment.
Achieving 99.999% Uptime Using Windows Server 2008 R2
When the topic of
disaster recovery comes up, many people think of the phrase “five
nines” or “99.999% uptime.” Although understanding this concept is
reasonably simple, actually providing five nines for a server or a
network can be quite a large and expensive task. Achieving 99.999%
uptime means that the server, application, network, or other component
covered by the claim can be down for only slightly more than five
minutes per year (about 5.26 minutes). Such a claim should be made with
caution and documented, explicitly citing what the service depends on.
For example, if the battery backups last only two hours, a documented
dependency for a server could be that it can withstand a power outage
of up to two hours.
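The downtime figure follows from simple arithmetic. As a minimal sketch, assuming a 365-day year, the following Python snippet computes the downtime permitted by a given uptime percentage; the function name allowed_downtime_minutes and the list of targets are illustrative choices, not part of any Windows tool.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a non-leap year


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, for an uptime target."""
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0


# Compare a few common availability targets.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

For 99.999%, the result is roughly 5.26 minutes per year, which leaves almost no room for an unplanned reboot, let alone a hardware failure.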
To provide 99.999% uptime for
services available on Windows Server 2008 R2, administrators can build
in redundancy and replication on a data, service, server, or site
level. Many Windows Server 2008 R2 services outlined in other chapters
of this book, including Failover Clusters, Network Load Balancing, and
the Distributed File System, can provide redundancy for the specific
services available.